
    Exclusive Supermask Subnetwork Training for Continual Learning

    Continual Learning (CL) methods focus on accumulating knowledge over time while avoiding catastrophic forgetting. Recently, Wortsman et al. (2020) proposed a CL method, SupSup, which uses a randomly initialized, fixed base network (model) and finds a supermask for each new task that selectively keeps or removes each weight to produce a subnetwork. Forgetting is prevented because the network weights are never updated. Although there is no forgetting, the performance of SupSup is sub-optimal because the fixed weights restrict its representational power. Furthermore, no knowledge is accumulated or transferred inside the model when new tasks are learned. Hence, we propose ExSSNeT (Exclusive Supermask SubNEtwork Training), which performs exclusive and non-overlapping subnetwork weight training. This avoids conflicting updates to shared weights by subsequent tasks, improving performance while still preventing forgetting. Furthermore, we propose a novel KNN-based Knowledge Transfer (KKT) module that utilizes previously acquired knowledge to learn new tasks better and faster. We demonstrate that ExSSNeT outperforms strong previous methods in both the NLP and vision domains while preventing forgetting. Moreover, ExSSNeT is particularly advantageous for sparse masks that activate 2-10% of the model parameters, yielding an average improvement of 8.3% over SupSup. Furthermore, ExSSNeT scales to a large number of tasks (100). Our code is available at https://github.com/prateeky2806/exessnet. Comment: ACL Findings 2023 (17 pages, 7 figures).
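    To make the exclusive-training idea concrete, here is a minimal, hedged sketch (our illustration, not the authors' released code; all names are ours) in which each task owns a binary supermask over a shared weight tensor, and only weights that no earlier task has claimed receive gradient updates, so earlier subnetworks are never overwritten:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
weight = torch.randn(256, 256, requires_grad=True)  # shared base weights
claimed = torch.zeros(256, 256, dtype=torch.bool)   # weights owned by earlier tasks

def train_task(mask, batches, lr=0.1):
    """mask: this task's binary supermask; batches: iterable of (x, y) pairs."""
    exclusive = mask & ~claimed                  # weights new to this task
    for x, y in batches:
        out = x @ (weight * mask.float())        # forward through the subnetwork
        loss = F.mse_loss(out, y)
        weight.grad = None
        loss.backward()
        with torch.no_grad():                    # update only the exclusive weights
            weight -= lr * weight.grad * exclusive.float()
    claimed.logical_or_(mask)                    # freeze this task's subnetwork

# Toy usage: two tasks with sparse random masks and dummy regression data.
for _ in range(2):
    task_mask = torch.rand(256, 256) < 0.05      # activates ~5% of the weights
    batches = [(torch.randn(8, 256), torch.randn(8, 256)) for _ in range(3)]
    train_task(task_mask, batches)
```

    The sketch omits how the masks themselves are found (per-weight score learning, as in SupSup) and the KKT module, which the abstract describes as using previously acquired knowledge to learn new tasks better and faster.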

    A Bayesian perspective on sampling of alternatives

    In this paper, we apply a Bayesian perspective to the sampling of alternatives for multinomial logit (MNL) and mixed multinomial logit (MMNL) models. Sampling of alternatives reduces the computational burden of evaluating the denominator of the logit choice probability for large choice sets by using only a smaller subset of sampled alternatives that includes the chosen alternative. To correct for the resulting overestimation of the choice probability, a correction factor has to be applied. McFadden (1978) proposes adding to the utility of each alternative a correction factor based on the probability of sampling the smaller subset of alternatives given that the alternative is chosen. McFadden's correction factor ensures consistency of parameter estimates under a wide range of sampling protocols. A special sampling protocol discussed by McFadden is uniform conditioning, which assigns the same sampling probability, and therefore the same correction factor, to each alternative in the sampled choice set. Since the same constant is added to every alternative, the correction factor cancels out, yet consistent estimates are still obtained. Bayesian estimation focuses on describing the full posterior distributions of the parameters of interest rather than on the consistency of their point estimates. We theoretically show that, in Bayesian MNL models, uniform conditioning is sufficient to minimise the loss of information on the parameters of interest, over the full posterior distribution, incurred by sampling of alternatives. Minimum loss of information is, however, not guaranteed for other sampling protocols. This result extends to Bayesian MMNL models estimated using the principle of data augmentation. Applying uniform conditioning, a more restrictive sampling protocol, is thus sufficient in a Bayesian estimation context to achieve desirable finite-sample properties of MNL and MMNL parameter estimates.
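    As a numerical illustration, the hedged sketch below (names are ours, not the paper's) computes sampled-MNL choice probabilities with McFadden-style correction terms added to the utilities, and checks that under uniform conditioning the constant correction cancels in the softmax:

```python
import numpy as np

def sampled_mnl_prob(V, log_pi):
    """Corrected choice probabilities over a sampled choice set D.

    V: systematic utilities of the alternatives in D.
    log_pi: log pi(D | j), the log probability of drawing the sampled set D
            given that alternative j is the chosen one (McFadden's correction).
    """
    z = V + log_pi           # utility plus correction factor
    z = z - z.max()          # stabilize the softmax numerically
    e = np.exp(z)
    return e / e.sum()

V = np.array([1.0, 0.2, -0.5])
# Uniform conditioning: pi(D | j) is the same constant for every j in D,
# so the correction shifts all utilities equally and cancels out.
assert np.allclose(sampled_mnl_prob(V, np.full(3, -2.3)),
                   sampled_mnl_prob(V, np.zeros(3)))
```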

    Indian Vehicle Ownership: Insights from Literature Review, Expert Interviews, and State-Level Model

    This study reviews existing vehicle ownership models for India and describes the results of interviews with nine experts, conducted to gather insights into Indians’ travel patterns and vehicle choices. According to the experts, vehicle price, fuel economy, and brand (in declining order of importance) are the most decisive factors in Indians’ car purchase decisions. This study also estimated household vehicle ownership levels across India’s 35 states and union territories using Census 2011 data. The results suggest that states with a higher proportion of computer-owning households, and a higher share of households living in rural areas with larger household sizes, are, ceteris paribus, likely to have higher car ownership.

    A Deep Generative Model for Feasible and Diverse Population Synthesis

    An ideal synthetic population, a key input to activity-based models, mimics the distribution of individual- and household-level attributes in the actual population. Since the attributes of the entire population are generally unavailable, household travel survey (HTS) samples are used for population synthesis. Synthesizing a population by directly sampling from an HTS ignores attribute combinations that are unobserved in the HTS samples but exist in the population, called 'sampling zeros'. A deep generative model (DGM) can potentially synthesize the sampling zeros, but at the expense of generating 'structural zeros' (i.e., infeasible attribute combinations that do not exist in the population). This study proposes a novel method to minimize structural zeros while preserving sampling zeros. Two regularizations are devised to customize the training of the DGM and are applied to a generative adversarial network (GAN) and a variational autoencoder (VAE). The adopted metrics capture the feasibility and diversity of the synthetic population: fewer structural zeros indicate higher feasibility, while fewer sampling zeros indicate lower diversity. Results show that the proposed regularizations considerably improve the feasibility and diversity of the synthesized population relative to traditional models. The proposed VAE additionally generated 23.5% of the population ignored by the sample with 79.2% precision (i.e., a 20.8% structural zero rate), while the proposed GAN generated 18.3% of the ignored population with 89.0% precision. The proposed improvements to DGMs generate a more feasible and diverse synthetic population, which is critical for the accuracy of activity-based models.
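    Since the abstract does not specify the two regularizations, the sketch below only shows schematically where such a term would enter a standard VAE loss over one-hot-encoded household attributes; `feasibility_penalty` is a hypothetical placeholder, not the paper's regularizer:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PopVAE(nn.Module):
    """Minimal VAE over a flat vector of one-hot-encoded attributes."""
    def __init__(self, dim, latent=16):
        super().__init__()
        self.enc = nn.Linear(dim, 2 * latent)   # outputs mean and log-variance
        self.dec = nn.Linear(latent, dim)

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        return self.dec(z), mu, logvar

def loss_fn(model, x, feasibility_penalty, lam=1.0):
    recon, mu, logvar = model(x)
    rec = F.binary_cross_entropy_with_logits(recon, x)              # reconstruction
    kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())  # KL term
    # A feasibility regularizer would be added here to penalize synthesizing
    # infeasible attribute combinations (structural zeros).
    return rec + kld + lam * feasibility_penalty(recon)
```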